Abstract

We consider the problem of calibration and the GREG method as
suggested and studied in Deville and Sarndal (1992). We show that
a GREG type estimator is typically not the minimum variance unbiased
estimator, even asymptotically. We suggest a similar estimator which
is unbiased and asymptotically has minimal variance.

1 Introduction 

The purpose of this note is to examine the popular calibration techniques
suggested, e.g., in Deville and Sarndal (1992) or Sarndal et al. (1992),
Chapter 6.4. These calibrated estimators are also known as GREG (the general
regression estimator). Our development and criterion are elementary: we
are interested in finding a minimum variance linear estimator. This leads
to an estimator very similar in form to the GREG estimator, but with
different constants. The difference between the two approaches is
demonstrated in what follows, and this demonstration is the main purpose
of this note.

First we review the above-mentioned GREG calibration approach, following
Deville and Sarndal (1992). Consider a finite population $U = \{1, \dots, N\}$
and a sample $S \subset U$. Denote $\pi_i = P(S \ni i)$ and
$\pi_{ij} = P(S \supseteq \{i, j\})$. Let $(y_i, x_i)$ be quantities associated
with item $i \in U$, where $y_i$ is a scalar and $x_i$ is a vector. The quantity
of interest is $t_y = \sum_{i \in U} y_i$, while the $x_i$ are considered
as covariates. Suppose the total $t_x = \sum_{i \in U} x_i$ is known; w.l.o.g.,
$t_x = 0$. It is then suggested to utilize this information about the totals
through the following reasoning.
Define $\hat t_x = \sum_{i \in S} x_i/\pi_i = \sum_{i \in S} d_i x_i$ and
$\hat t_y = \sum_{i \in S} y_i/\pi_i = \sum_{i \in S} d_i y_i$, where $d_i = 1/\pi_i$.
These are the Horvitz-Thompson estimators; hence $E\hat t_y = t_y$ and
$E\hat t_x = t_x = 0$. However, the value of $\hat t_x$ is typically
different from 0, which is unfortunate.

It is suggested to find "better" or "improved" weights $w_i$, $i \in S$
("better" than $d_i$), and to estimate $t_y$ by $\sum_{i \in S} w_i y_i$. The
heuristic derivation of the improved (random) weights $w_i$, $i \in S$, is the
following. Given $S$, denote by $w$ the vector of improved weights. Then $w$ is
defined as the solution of the program:

$$\min_{w} \sum_{i \in S} \frac{(w_i - d_i)^2}{d_i q_i} \quad \text{s.t.} \quad \sum_{i \in S} w_i x_i = 0, \qquad (1)$$

where the $q_i$ are selected parameters which, by default, are suggested to be
set to 1. The resulting estimator, denoted $\hat t_{y|x}$, may be written as
$\hat t_{y|x} = \sum_{k \in S} w_k y_k$.

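The solution of (1) is simple. Using a vector of Lagrange multipliers $\lambda$,
we find that

$$w_i = (1 + \lambda^T q_i x_i)\, d_i, \qquad (2)$$

where $\lambda$ is such that the constraint is satisfied, namely

$$\lambda = -\Big(\sum_{i \in S} q_i d_i x_i x_i^T\Big)^{-1} \sum_{i \in S} d_i x_i = -H_q^{-1} \hat t_x,$$

where $H_q = \sum_{i \in S} q_i d_i x_i x_i^T$. Hence

$$\hat t_{y|x} = \hat t_y - \hat\beta^T \hat t_x,$$

where $\hat\beta = H_q^{-1} \sum_{i \in S} d_i q_i y_i x_i$.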

In the following we take $q_i = 1$ and then denote $H_q$ simply by $H$.
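
As a small illustration of (1) and (2), the following Python sketch computes the calibrated weights for a one-dimensional covariate with $q_i = 1$, and checks numerically that they satisfy the calibration constraint and reproduce $\hat t_y - \hat\beta \hat t_x$. The function names and the sample values are hypothetical and serve only this check.

```python
def greg_weights(x, pi):
    """Calibrated weights w_i = (1 + lambda * x_i) * d_i for scalar x and q_i = 1."""
    d = [1.0 / p for p in pi]                          # design weights d_i = 1/pi_i
    t_x_hat = sum(di * xi for di, xi in zip(d, x))     # Horvitz-Thompson estimate of t_x
    H = sum(di * xi * xi for di, xi in zip(d, x))      # H = sum_S d_i x_i^2
    lam = -t_x_hat / H                                 # lambda = -H^{-1} t_x_hat
    return [(1.0 + lam * xi) * di for di, xi in zip(d, x)]


def greg_estimate(x, y, pi):
    """t_y_hat - beta_hat * t_x_hat with beta_hat = H^{-1} sum_S d_i y_i x_i."""
    d = [1.0 / p for p in pi]
    t_x_hat = sum(di * xi for di, xi in zip(d, x))
    t_y_hat = sum(di * yi for di, yi in zip(d, y))
    H = sum(di * xi * xi for di, xi in zip(d, x))
    beta_hat = sum(di * yi * xi for di, yi, xi in zip(d, y, x)) / H
    return t_y_hat - beta_hat * t_x_hat


# Hypothetical sample: covariates, responses and inclusion probabilities.
x = [-1.0, 0.5, 2.0, -0.3]
y = [1.0, 2.0, 4.0, 1.5]
pi = [0.1, 0.2, 0.25, 0.5]

w = greg_weights(x, pi)
calibration_gap = sum(wi * xi for wi, xi in zip(w, x))   # sum_S w_i x_i, should be ~0
weighted_total = sum(wi * yi for wi, yi in zip(w, y))    # sum_S w_i y_i
```

The weights never look at $y$, which is what allows the same $w_i$ to be reused for any other variable measured on the sample.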

Note that for any (pre-determined) $\beta$, $\hat t_y - \beta^T \hat t_x$ is an unbiased estimator
of $t_y$. Hence we may look for the minimal variance estimator of this type.
One may restrict oneself to linear estimators (linear in $y_i$, $i \in S$), that
is, estimators of the form $\sum_{i \in S} w_i y_i$, with a sequence of weights that
could simultaneously be used for estimating any functional. Still,
one may look for weights that ensure that the estimator has
minimal variance. We will argue that the weights given by (2) are, generally
speaking, far from optimal.

Similar problems were discussed in Bickel, Klaassen, Ritov, and Wellner
(1998) in the context of i.i.d. observations and semiparametric models. The
question there was formulated as the semiparametric efficient estimation of a
parameter when other parameters are known (e.g., estimation of a joint
distribution when the marginal distributions are known). Our solution is
similar to the examples analyzed in that literature.

2 Minimum variance linear unbiased estimator 

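Consider estimators of $t_y$ which are linear in $\hat t_y$ and $\hat t_x$, i.e., of the form

$$T(\beta) = \hat t_y - \beta^T \hat t_x,$$

where $\beta$ is non-random. Every estimator in this class is unbiased since
$E\hat t_x = 0$. Consider the estimator $T(\beta_0)$ in this class with minimal
variance. Since $\mathrm{Var}\,T(\beta) = \mathrm{Var}(\hat t_y) - 2\beta^T \Sigma_{\hat t_x \hat t_y} + \beta^T \Sigma_{\hat t_x} \beta$, clearly

$$\beta_0 = \Sigma_{\hat t_x}^{-1} \Sigma_{\hat t_x \hat t_y},$$

where $\Sigma_{\hat t_x}$ and $\Sigma_{\hat t_x \hat t_y}$ are the variance-covariance matrix of $\hat t_x$ and the
covariance vector of $\hat t_x$ and $\hat t_y$, respectively.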
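
To make $\beta_0$ concrete, the Python sketch below computes the exact design moments of $\hat t_x$ and $\hat t_y$ for a toy population under simple random sampling without replacement, by enumerating every possible sample, and then evaluates $\beta_0$. The population values and helper names are hypothetical; $x$ is scalar and centred so that $t_x = 0$.

```python
from itertools import combinations


def design_moments(x, y, n):
    """Exact Var(t_x_hat), Cov(t_x_hat, t_y_hat), Var(t_y_hat) under SRSWOR of size n."""
    N = len(x)
    d = N / n                                  # d_i = 1/pi_i with pi_i = n/N
    samples = list(combinations(range(N), n))  # all samples are equally likely
    tx = [d * sum(x[i] for i in s) for s in samples]
    ty = [d * sum(y[i] for i in s) for s in samples]
    m = len(samples)
    mx, my = sum(tx) / m, sum(ty) / m
    vxx = sum((a - mx) ** 2 for a in tx) / m
    vxy = sum((a - mx) * (b - my) for a, b in zip(tx, ty)) / m
    vyy = sum((b - my) ** 2 for b in ty) / m
    return vxx, vxy, vyy


def var_T(beta, vxx, vxy, vyy):
    """Var(t_y_hat - beta * t_x_hat) for a scalar covariate."""
    return vyy - 2 * beta * vxy + beta ** 2 * vxx


x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical covariates, t_x = 0
y = [1.0, 3.0, 2.0, 5.0, 4.0]     # hypothetical responses
vxx, vxy, vyy = design_moments(x, y, n=2)
beta0 = vxy / vxx                 # beta_0 = Sigma_{t_x}^{-1} Sigma_{t_x t_y}
```

Under simple random sampling the design covariances are proportional to the population ones, so here $\beta_0$ coincides with the population least-squares slope (0.8 for these values). Examples 2.1 and 2.2 concern stratified and cluster designs, where this coincidence breaks down.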

First we argue that $\hat\beta$ is not a consistent estimator of $\beta_0$. The following
example, while extreme, is enlightening.

Example 2.1 Consider a population divided into two strata of equal sizes.
For each $i \in U$ there are a corresponding $y_i$ and $x_i$, i.e., we have
one-dimensional covariates. Suppose we randomly sample $n$ units from each stratum,
i.e., a total of $M = 2n$, where $\pi_i = M/N$.

Assume the mean of the $x_i$ in stratum 1 is $-1$ and their mean in stratum 2
is 1. The variance of $x_i$ within each stratum is $\sigma^2$. Now assume that in
stratum 1, $y_i = -1$, while in stratum 2, $y_i = 1$. Then $\hat t_y = t_y = 0$ for
every sample, so $\mathrm{Var}(\hat t_y) = 0$ and hence $\beta_0 = 0$. In fact, the
optimal estimator in this case is simply $\hat t_y$, which always equals $t_y = 0$.
On the other hand, $\hat\beta = H^{-1} \sum_{i \in S} d_i y_i x_i \to (1 + \sigma^2)^{-1}$.
Asymptotically (as $n \to \infty$) the GREG estimator $T(\hat\beta) \approx \hat t_y - (1 + \sigma^2)^{-1} \hat t_x$
therefore has variance of order $N^2/n$, while the optimal estimator for this
case is exact, with zero variance.
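
A quick simulation can illustrate Example 2.1. The Python sketch below assumes, purely for illustration, that the within-stratum $x_i$ are normal and that the population is large enough for the sampled $x_i$ to be treated as independent draws. The equal design weights cancel in $\hat\beta$, and the balanced design makes $\hat t_y$ exactly zero.

```python
import random


def example_2_1(n, sigma, seed=0):
    """Return (beta_hat, sum of sampled y) for one stratified sample of n + n units."""
    rng = random.Random(seed)
    # Stratum 1: x ~ N(-1, sigma^2), y = -1.  Stratum 2: x ~ N(1, sigma^2), y = 1.
    xs = [rng.gauss(-1.0, sigma) for _ in range(n)] + [rng.gauss(1.0, sigma) for _ in range(n)]
    ys = [-1.0] * n + [1.0] * n
    # With equal weights d_i = N/M, beta_hat = sum x_i y_i / sum x_i^2.
    beta_hat = sum(xi * yi for xi, yi in zip(xs, ys)) / sum(xi * xi for xi in xs)
    # t_y_hat = (N/M) * sum(ys), which is exactly 0 for every sample.
    return beta_hat, sum(ys)


sigma = 1.0
beta_hat, y_sum = example_2_1(n=20000, sigma=sigma)
limit = 1.0 / (1.0 + sigma ** 2)   # (1 + sigma^2)^{-1}
```

With $\sigma = 1$, $\hat\beta$ settles near $1/2$ rather than near $\beta_0 = 0$, so the GREG correction adds variance to an estimator that was already exact.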

The difference between $\beta_0$ and $\hat\beta$ would be large when there is more
than a scale difference between the second moments of $\hat t_y$, $\hat t_x$ and
those of $Y$, $X$. This precludes simple random sampling, but is typical of other
sampling schemes. The following example is less extreme than the first one,
but describes a practical situation.

Example 2.2 Suppose we sample clusters. The units in the sample are indexed
by $j$ and $k$, where all units in cluster $j$ refer to the same central value
$s_j$ and satisfy $x_{jk} = s_j + e_{jk}$ and $y_{jk} = s_j + \nu_{jk}$, where the
correlation between $e_{jk}$ and $\nu_{jk}$ is 0. Suppose that $K$ units are sampled
in each cluster. It is clear that if the number of clusters is large, then, with
obvious notation, $\beta_0 \approx \sigma_s^2/(\sigma_s^2 + \sigma_e^2/K)$ while
$\hat\beta \approx \sigma_s^2/(\sigma_s^2 + \sigma_e^2)$. In the simple case where
$\sigma_s^2 = \sigma_e^2 = \sigma_\nu^2$, if $K = 5$ then the estimator with $\beta_0$ would have a
variance smaller by approximately 25% than the variance of the estimator using $\hat\beta$.
The difference is approximately 50% when $K = 10$.
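
The 25% and 50% figures can be checked directly. Assuming $s_j$, $e_{jk}$, $\nu_{jk}$ are mutually independent with mean zero, the cluster total of $y_{jk} - \beta x_{jk}$ over the $K$ sampled units has variance $K^2\sigma_s^2(1-\beta)^2 + K\sigma_\nu^2 + K\beta^2\sigma_e^2$, and with many independently sampled clusters the variance of $T(\beta)$ is proportional to it. The Python sketch below (function names are ours) plugs in the two limits of Example 2.2.

```python
def cluster_variance(beta, K, var_s, var_e, var_nu):
    """Variance of sum_k (y_jk - beta * x_jk) over the K sampled units of one cluster."""
    return K ** 2 * var_s * (1 - beta) ** 2 + K * var_nu + K * beta ** 2 * var_e


def variance_reduction(K, var_s=1.0, var_e=1.0, var_nu=1.0):
    """Relative variance reduction of T(beta_0) over T(beta_hat) in Example 2.2."""
    beta_0 = var_s / (var_s + var_e / K)    # minimiser of cluster_variance
    beta_hat = var_s / (var_s + var_e)      # large-sample limit of beta_hat
    v0 = cluster_variance(beta_0, K, var_s, var_e, var_nu)
    v1 = cluster_variance(beta_hat, K, var_s, var_e, var_nu)
    return 1 - v0 / v1


gain_5 = variance_reduction(5)     # 4/15, about 27%
gain_10 = variance_reduction(10)   # about 49%
```

Setting the derivative of `cluster_variance` in $\beta$ to zero gives exactly $\sigma_s^2/(\sigma_s^2 + \sigma_e^2/K)$, which is why that value is used for $\beta_0$.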

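In order to estimate $\Sigma_{\hat t_x}$ and $\Sigma_{\hat t_x \hat t_y}$, we may use the classical
variance estimators for the Horvitz-Thompson estimator; see, e.g., Cochran (1977)
or Sharon (1999). These estimators are typically given in the literature for a
one-dimensional variance rather than for a covariance matrix; however, the
same reasoning applies. Since $t_x = 0$, and with the convention $\pi_{ii} = \pi_i$,

$$\Sigma_{\hat t_x \hat t_y} = E \sum_{i,j \in S} \frac{y_i x_j}{\pi_i \pi_j} = \sum_{i,j \in U} \frac{\pi_{ij}}{\pi_i \pi_j}\, y_i x_j = \sum_{i,j \in U} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, y_i x_j.$$

Similarly,

$$\Sigma_{\hat t_x} = \sum_{i,j \in U} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, x_i x_j^T.$$

Hence, for any constant $c$ (and assuming $\pi_{ij} > 0$ for all $i, j \in U$),
the following are unbiased estimators:

$$\hat\Sigma_{\hat t_x \hat t_y} = \sum_{i,j \in S} \frac{1}{\pi_{ij}} \Big( \frac{\pi_{ij}}{\pi_i \pi_j} - c \Big) y_i x_j, \qquad \hat\Sigma_{\hat t_x} = \sum_{i,j \in S} \frac{1}{\pi_{ij}} \Big( \frac{\pi_{ij}}{\pi_i \pi_j} - c \Big) x_i x_j^T. \qquad (3)$$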
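
Since the unbiasedness of (3) for every $c$ rests on $t_x = 0$ and $\pi_{ij} > 0$, a brute-force check is reassuring. The Python sketch below, for a hypothetical toy population with scalar $x$ under simple random sampling without replacement, averages $\hat\Sigma_{\hat t_x \hat t_y}$ from (3) over all samples and compares the average with the exact covariance; the helper names are illustrative only.

```python
from itertools import combinations


def sigma_hat_xy(sample, x, y, pi, pi2, c):
    """Estimator (3) of Cov(t_x_hat, t_y_hat) for scalar x; pi2[i][i] must equal pi[i]."""
    return sum(
        (pi2[i][j] / (pi[i] * pi[j]) - c) * y[i] * x[j] / pi2[i][j]
        for i in sample
        for j in sample
    )


def check_unbiased(x, y, n, c):
    """Return (exact covariance, design expectation of the estimator) under SRSWOR."""
    N = len(x)
    pi = [n / N] * N
    pij = n * (n - 1) / (N * (N - 1))
    pi2 = [[pi[i] if i == j else pij for j in range(N)] for i in range(N)]
    samples = list(combinations(range(N), n))   # each has probability 1/len(samples)
    d = N / n
    # Since t_x = 0, E[t_x_hat * t_y_hat] is the covariance itself.
    exact = sum(
        (d * sum(x[i] for i in s)) * (d * sum(y[i] for i in s)) for s in samples
    ) / len(samples)
    expected = sum(sigma_hat_xy(s, x, y, pi, pi2, c) for s in samples) / len(samples)
    return exact, expected


x = [-2.0, -1.0, 1.0, 2.0]   # hypothetical covariates with t_x = 0
y = [1.0, 4.0, 2.0, 3.0]     # hypothetical responses
results = {c: check_unbiased(x, y, n=2, c=c) for c in (0.0, 1.0, 0.37)}
```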



We consider a sequence of populations and designs such that the
estimators in (3) are consistent. Typically, taking $c = 1$ in (3)
suffices to make most of the terms of lower order than the diagonal.
In a simple random sample without replacement, $c = (n-1)N/\{n(N-1)\}$
leaves only the diagonal.
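
For simple random sampling without replacement one has $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$ for $i \neq j$, so the claim about $c$ can be checked numerically. The sketch below evaluates the coefficient multiplying $y_i x_j$ in (3); the population and sample sizes are arbitrary choices for illustration.

```python
def coefficient(pi_i, pi_j, pi_ij, c):
    """Coefficient of y_i x_j (or x_i x_j^T) in estimator (3)."""
    return (pi_ij / (pi_i * pi_j) - c) / pi_ij


N, n = 50, 8                                 # arbitrary sizes for illustration
pi = n / N
pi_pair = n * (n - 1) / (N * (N - 1))        # pi_ij for i != j under SRSWOR
c_srs = (n - 1) * N / (n * (N - 1))

off_diagonal = coefficient(pi, pi, pi_pair, c_srs)   # vanishes up to rounding
diagonal = coefficient(pi, pi, pi, c_srs)            # uses pi_ii = pi_i
off_diagonal_c1 = coefficient(pi, pi, pi_pair, 1.0)  # c = 1 leaves a nonzero cross term
```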

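Theorem 2.1 Let $\hat\beta_0 = \hat\Sigma_{\hat t_x}^{-1} \hat\Sigma_{\hat t_x \hat t_y}$, where the terms on the RHS are given
by (3) with a given $c$. Then

$$T(\hat\beta_0) = \sum_{i \in S} w_i y_i,$$

where

$$w_i = \frac{1}{\pi_i} - \hat t_x^T \hat\Sigma_{\hat t_x}^{-1} \sum_{j \in S} \frac{1}{\pi_{ij}} \Big( \frac{\pi_{ij}}{\pi_i \pi_j} - c \Big) x_j.$$

Thus the weights are a function of $x_i$, $\pi_i$, $\pi_{ij}$, $i, j \in S$, only. In
particular, they do not depend on the $y_i$.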
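
Theorem 2.1 is an algebraic identity, so it can be checked on any sample. The Python sketch below assumes a scalar covariate and simple random sampling without replacement (the sample values, $N$, $n$ and $c = 1$ are hypothetical). It builds the weights $w_i$ without using $y$ and compares $\sum_{i \in S} w_i y_i$ with $\hat t_y - \hat\beta_0 \hat t_x$.

```python
def pair_coefficients(pi, pi2, c):
    """a_ij = (pi_ij / (pi_i pi_j) - c) / pi_ij, the coefficients used in (3)."""
    m = len(pi)
    return [[(pi2[i][j] / (pi[i] * pi[j]) - c) / pi2[i][j] for j in range(m)] for i in range(m)]


def theorem_weights(x, pi, pi2, c):
    """Weights w_i of Theorem 2.1 for scalar x; they depend on x and pi only."""
    a = pair_coefficients(pi, pi2, c)
    m = len(x)
    t_x_hat = sum(x[i] / pi[i] for i in range(m))
    s_xx = sum(a[i][j] * x[i] * x[j] for i in range(m) for j in range(m))
    return [1 / pi[i] - t_x_hat / s_xx * sum(a[i][j] * x[j] for j in range(m)) for i in range(m)]


def plug_in_estimator(x, y, pi, pi2, c):
    """t_y_hat - beta0_hat * t_x_hat with beta0_hat = Sigma_hat_xy / Sigma_hat_xx."""
    a = pair_coefficients(pi, pi2, c)
    m = len(x)
    t_x_hat = sum(x[i] / pi[i] for i in range(m))
    t_y_hat = sum(y[i] / pi[i] for i in range(m))
    s_xx = sum(a[i][j] * x[i] * x[j] for i in range(m) for j in range(m))
    s_xy = sum(a[i][j] * y[i] * x[j] for i in range(m) for j in range(m))
    return t_y_hat - (s_xy / s_xx) * t_x_hat


N, n, c = 20, 4, 1.0
x = [-1.5, 0.2, 2.0, 0.7]        # hypothetical sampled covariates
y = [1.0, 2.5, 4.0, 3.0]         # hypothetical sampled responses
pi = [n / N] * n
pij = n * (n - 1) / (N * (N - 1))
pi2 = [[pi[i] if i == j else pij for j in range(n)] for i in range(n)]

w = theorem_weights(x, pi, pi2, c)
via_weights = sum(wi * yi for wi, yi in zip(w, y))
direct = plug_in_estimator(x, y, pi, pi2, c)
```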

3 Examining $\hat\beta$ under linear model assumptions

In this section we examine the rationale for the estimator $\hat\beta$ under the
convenient and (too) often assumed super-population model under which
$Y_k = \beta^T x_k + \varepsilon_k$, where $E\varepsilon_k = 0$ and, for simplicity, the
$\varepsilon_k$, $k \in U$, are uncorrelated with equal variance.

Under this model it is easy to check that $\Sigma_{xy} = \Sigma_{xx}\beta$ and
$\Sigma_{\hat t_x \hat t_y} = \Sigma_{\hat t_x}\beta$. Hence $\hat\beta$ is a possible estimator of $\beta_0 = \beta$.
However, if this model is assumed, it is still not clear why $\hat\beta$ should be
used. We have here a standard regression problem. Elementary regression theory
(namely, the Gauss-Markov theorem) implies that the optimal estimator is not
$\hat\beta$, but the standard unweighted linear regression of $Y_1, \dots, Y_n$ on $x_1, \dots, x_n$.
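
With $q_i = 1$ and a scalar covariate, $\hat\beta = \sum_{i \in S} d_i x_i y_i / \sum_{i \in S} d_i x_i^2$ is a weighted least-squares slope through the origin. Treating the sampled $x_i$ and $d_i$ as fixed, and assuming uncorrelated homoscedastic errors, its model variance can be compared in closed form with that of the unweighted slope. The values below are hypothetical, and the helper is only a sketch of the Gauss-Markov comparison.

```python
def slope_variances(x, d, sigma2=1.0):
    """Model variances of the unweighted and d-weighted no-intercept slopes.

    Assumes y_k = beta * x_k + eps_k with uncorrelated eps_k of common variance sigma2.
    """
    sxx = sum(xi * xi for xi in x)
    var_unweighted = sigma2 / sxx
    sdx = sum(di * xi * xi for di, xi in zip(d, x))
    sddx = sum(di * di * xi * xi for di, xi in zip(d, x))
    var_weighted = sigma2 * sddx / sdx ** 2    # Var of sum d x y / sum d x^2
    return var_unweighted, var_weighted


v_ols, v_beta_hat = slope_variances(x=[1.0, 2.0, 3.0], d=[1.0, 1.0, 10.0])
# v_ols = 1/14 (about 0.071) versus v_beta_hat = 905/9025 (about 0.100)
```

By the Cauchy-Schwarz inequality the weighted variance is never smaller, with equality when the $d_i$ are constant, as under simple random sampling.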

It might be argued that we are in fact taking the linear super-population
model assumption with a grain of salt, and thus we are using the
estimator for

$$\beta = \arg\min_b \sum_{i \in U} (y_i - b^T x_i)^2, \qquad (4)$$

which is defined under no linear model assumptions. However, since in this
case we have no interest in that population parameter per se, but only in a
tool for constructing a good estimator of $t_y$, $\beta_0$ should be our target.

To summarize: if we are interested in the super-population parameter,
then $\hat\beta$ is not efficient, and if we are interested in a good estimator of
$t_y$, then $\hat\beta$ is not consistent for $\beta_0$ under complex sampling schemes.
4 Partial knowledge of $t_x$



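In many cases $t_x$ is not really known. However, there may be an
additional independent sample with information about $t_x$ but not about $t_y$.
Thus we have three unbiased estimators: $\hat t_y$ and $\hat t_x$, based on one
sample, and $\hat t_x^{(2)}$, based on another, independent sample. The best
estimator of $t_y$ would be based on all three. Following the same argument as
before, we should consider estimators of the form

$$\hat t_y - \beta^T (\hat t_x - \hat t_x^{(2)}).$$

Note that this estimator is unbiased for $t_y$ for any $\beta$. The
optimal value, however, is given by

$$\beta_{02} = \big(\Sigma_{\hat t_x} + \Sigma_2\big)^{-1} \Sigma_{\hat t_x \hat t_y},$$

where $\Sigma_2$ is the variance-covariance matrix of $\hat t_x^{(2)}$ (by independence,
$\hat t_x^{(2)}$ is uncorrelated with $\hat t_x$ and $\hat t_y$). Note that if $\hat t_x^{(2)}$ is based on
the whole universe $U$, then $\Sigma_2 = 0$ and $\beta_{02} = \beta_0$.
Even more generally, we can consider a situation in which $x$ is measured
for all units in a super-sample $S_2$, $S_1 \subset S_2 \subset U$, while the $y$ values
are measured only for units in the smaller sample $S_1$. For example, $y$ is
measured for only one unit in a cluster, while $x$ is measured for all units.
Let $\delta_x = \hat t_x^{(1)} - \hat t_x^{(2)}$, where $\hat t_x^{(1)}$ and $\hat t_x^{(2)}$ are the estimators of
$t_x$ based on $S_1$ and $S_2$, respectively. It may be natural to assume that
$\delta_x$ is correlated with $\hat t_y$ while having mean 0. We consider the natural
extension $\hat t_y - \beta^T \delta_x$, with $\beta_{03} = \Sigma_{\delta_x}^{-1} \Sigma_{\delta_x \hat t_y}$.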
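
For a scalar covariate, and using that $\hat t_x^{(2)}$ is independent of $(\hat t_y, \hat t_x)$, the variance of $\hat t_y - \beta(\hat t_x - \hat t_x^{(2)})$ is a quadratic in $\beta$. The following Python sketch, with hypothetical moment values, evaluates it and its minimiser $\beta_{02}$, and confirms that $\beta_{02}$ reduces to $\beta_0$ when $\Sigma_2 = 0$.

```python
def beta_02(cov_xy, var_x, var_x2):
    """Minimiser of Var(t_y_hat - beta * (t_x_hat - t_x_hat2)) for scalar x."""
    return cov_xy / (var_x + var_x2)


def combined_variance(beta, var_y, cov_xy, var_x, var_x2):
    """Var(t_y_hat - beta * (t_x_hat - t_x_hat2)) when t_x_hat2 comes from an independent sample."""
    return var_y - 2 * beta * cov_xy + beta ** 2 * (var_x + var_x2)


# Hypothetical design moments.
var_y, cov_xy, var_x, var_x2 = 3.0, 2.0, 4.0, 1.0
b = beta_02(cov_xy, var_x, var_x2)                           # 0.4
v_min = combined_variance(b, var_y, cov_xy, var_x, var_x2)   # 3 - 2**2 / 5 = 2.2
```

The extra term $\Sigma_2$ shrinks the optimal coefficient, reflecting that $\hat t_x^{(2)}$ is itself noisy.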

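Example 4.1 Consider the super-population model in which it is assumed
that $y_{jk} = \beta x_{jk} + \varepsilon_{jk}$, $j = 1, \dots, M$, $k = 1, \dots, K$, where the
$\varepsilon_{jk}$ are i.i.d., independent of the $x_{jk}$, while $x_{jk}$ and $x_{j'k'}$ are
independent if $j \neq j'$, and have correlation $\rho$ if $j = j'$ and $k \neq k'$. Let
$\mathrm{Var}(x_{jk}) = \sigma^2$ and $\mathrm{Var}(\varepsilon_{jk}) = \sigma_\varepsilon^2$. Consider a sample
$C \subset \{1, \dots, M\}$ of $n$ clusters. Suppose that for each $j \in C$, $x_{jk}$,
$k = 1, \dots, K$, are obtained, while only $y_{j1}$ is measured; assume also that
$M \gg n$. The universe size is $N = MK$. For simplicity, we assume a
simple random sample (with replacement) of clusters. Then
$\hat t_y = \frac{N}{n} \sum_{j \in C} y_{j1}$ and

$$\delta_x = \frac{N}{n} \sum_{j \in C} \Big( x_{j1} - \frac{1}{K} \sum_{k=1}^{K} x_{jk} \Big).$$

It is easy to verify that

$$\mathrm{Var}(\hat t_y) = \frac{N^2}{n} \big(\beta^2 \sigma^2 + \sigma_\varepsilon^2\big),$$

$$\mathrm{Var}(\delta_x) = \frac{N^2}{n}\, \frac{K-1}{K}\, (1 - \rho)\, \sigma^2,$$

$$\mathrm{cov}(\delta_x, \hat t_y) = \frac{N^2}{n}\, \frac{K-1}{K}\, (1 - \rho)\, \beta \sigma^2.$$

Hence $\beta_{03} = \beta$ and

$$\frac{\mathrm{Var}(\hat t_y - \beta_{03} \delta_x)}{\mathrm{Var}(\hat t_y)} = 1 - \frac{K-1}{K}\, (1 - \rho)\, \frac{\beta^2 \sigma^2}{\beta^2 \sigma^2 + \sigma_\varepsilon^2},$$

which equals $1 - \frac{K-1}{K}(1 - \rho)$ when $\sigma_\varepsilon^2$ is negligible relative to $\beta^2\sigma^2$.
The efficiency of the scheme increases as $K$ increases and $\rho$ decreases. Note
that the case of a simple random sample of units in which the $y$ value is
measured only for a small random sub-sample corresponds to $\rho = 0$.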
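
The variance ratio of Example 4.1 can be checked by simulation. The Python sketch below assumes, for illustration only, that the equicorrelated $x_{jk}$ are generated as $\sigma(\sqrt{\rho}\,u_j + \sqrt{1-\rho}\,v_{jk})$ with independent standard normal $u_j$, $v_{jk}$, and that the $\varepsilon_{jk}$ are normal. The common factor $N/n$ cancels in the ratio, so only per-cluster contributions are simulated.

```python
import random


def ratio_theory(K, rho, beta=1.0, sigma=1.0, sigma_eps=0.0):
    """Var(t_y_hat - beta_03 * delta_x) / Var(t_y_hat) in Example 4.1."""
    signal = beta ** 2 * sigma ** 2
    return 1 - (K - 1) / K * (1 - rho) * signal / (signal + sigma_eps ** 2)


def ratio_monte_carlo(K, rho, beta=1.0, sigma=1.0, sigma_eps=0.0, clusters=50_000, seed=1):
    """Simulated version of the same ratio; each sampled cluster contributes one term."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(clusters):
        u = rng.gauss(0.0, 1.0)   # shared cluster effect gives within-cluster correlation rho
        x = [sigma * (rho ** 0.5 * u + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)) for _ in range(K)]
        a.append(beta * x[0] + rng.gauss(0.0, sigma_eps))   # y_j1, term of t_y_hat
        b.append(x[0] - sum(x) / K)                          # term of delta_x
    resid = [ai - beta * bi for ai, bi in zip(a, b)]         # beta_03 = beta here

    def var(v):
        mean = sum(v) / len(v)
        return sum((t - mean) ** 2 for t in v) / (len(v) - 1)

    return var(resid) / var(a)


theory = ratio_theory(K=5, rho=0.2, sigma_eps=0.5)        # 0.488
simulated = ratio_monte_carlo(K=5, rho=0.2, sigma_eps=0.5)
```

With $\rho = 0$ and no residual noise the theoretical ratio is $1/K$, the case of a simple random sample of units with $y$ observed on a sub-sample.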